Robust CNN-based speech recognition with Gabor filter kernels
نویسندگان
چکیده
As has been extensively shown, acoustic features for speech recognition can be learned from neural networks with multiple hidden layers. However, the learned transformations may not sufficiently generalize to test sets that have a significant mismatch to the training data. Gabor features, on the other hand, are generated from spectro-temporal filters designed to model human auditory processing. In previous work, these features are used as inputs to neural networks, which improved word accuracy for speech recognition in the presence of noise. Here we propose a neural network architecture called a Gabor Convolutional Neural Network (GCNN) that incorporates Gabor functions into convolutional filter kernels. In this architecture, a variety of Gabor features served as the multiple feature maps of the convolutional layer. The filter coefficients are further tuned by back-propagation training. Experiments used two noisy versions of the WSJ corpus: Aurora 4, and RATS re-noised WSJ. In both cases, the proposed architecture performs better than other noise-robust features that we have tried, namely, ETSI-AFE, PNCC, Gabor features without the CNN-based approach, and our best neural network features that don’t incorporate Gabor functions.
منابع مشابه
An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition
Convolutional Neural Networks (CNNs) have been shown their performance in speech recognition systems for extracting features, and also acoustic modeling. In addition, CNNs have been used for robust speech recognition and competitive results have been reported. Convolutive Bottleneck Network (CBN) is a kind of CNNs which has a bottleneck layer among its fully connected layers. The bottleneck fea...
متن کاملشبکه عصبی پیچشی با پنجرههای قابل تطبیق برای بازشناسی گفتار
Although, speech recognition systems are widely used and their accuracies are continuously increased, there is a considerable performance gap between their accuracies and human recognition ability. This is partially due to high speaker variations in speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...
متن کاملHandwritten Character Recognition Using CNN Gabor-Type Filters
This paper proposes an approach for handwritten character recognition using nonlinear normalisation, a CNN Gabor-Type filter, a Location Based Dominant Orientation Map and cross correlation. Based on a test set of 26 test characters acting as template and a set consisting of 4 sets of 26 unknown handwritten test characters, max. 92 % correct recognition is provided. Recognition rate is studied ...
متن کاملSeparable spectro-temporal Gabor filter bank features: Reducing the complexity of robust features for automatic speech recognition.
To test if simultaneous spectral and temporal processing is required to extract robust features for automatic speech recognition (ASR), the robust spectro-temporal two-dimensional-Gabor filter bank (GBFB) front-end from Schädler, Meyer, and Kollmeier [J. Acoust. Soc. Am. 131, 4134-4151 (2012)] was de-composed into a spectral one-dimensional-Gabor filter bank and a temporal one-dimensional-Gabor...
متن کاملFeeding Hand-Crafted Features for Enhancing the Performance of Convolutional Neural Networks
Since the convolutional neural network (CNN) is believed to find right features for a given problem, the study of hand-crafted features is somewhat neglected these days. In this paper, we show that finding an appropriate feature for the given problem may be still important as they can enhance the performance of CNN-based algorithms. Specifically, we show that feeding an appropriate feature to t...
متن کامل